
It seems there is an endless stream of artificial intelligence (AI) news, including in the field of medicine. There also continues to be a debate about the true impact of AI – how much is hype, and how much is a genuine, transformative advance?

As with many technological advances, it’s both. New tech, perhaps especially in medicine, spawns a great deal of initial hype, as the media reaches for sensational headlines and people speculate about potential ultimate applications. Reality never lives up to this hype, at least not initially, but in the post-hype phase the technology quietly improves in the background, sometimes even exceeding initial expectations. We also learn what the new technology can and cannot do, so expectations become much more realistic.

The same seems to be true for AI – as the hype has died down somewhat, the technology continues to advance, while researchers find new applications. We are finding that the recent crop of AI applications is particularly well-suited to certain tasks with direct clinical relevance. Two functions are particularly useful – AIs are very good at pattern recognition and at distilling vast amounts of information into cogent summaries. How do these apply in medicine?

Pattern recognition, as any experienced clinician can tell you, is central to making diagnoses. There are essentially two approaches to diagnosis: intuition and analytical thinking. Clinicians build their intuition through experience, and get better at diagnosis over time as a result. They can recognize the gestalt of signs and symptoms of a disease because they have seen it before.

But a diagnosis cannot end there. You have to back it up with specific analysis – exam findings, specific elements in the patient’s history, and laboratory testing. This often involves complex statistical thinking, such as knowing the predictive value of a specific laboratory finding for the probability of a diagnosis. Further, being a good diagnostician requires thorough knowledge – the ability to generate a complete differential diagnosis of all possibilities and prioritize them by probability.
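
To make that statistical step concrete, here is a minimal sketch in Python (with made-up numbers, purely for illustration) of how a test’s sensitivity and specificity combine with a pre-test probability, via Bayes’ theorem, to yield the post-test probability of a diagnosis:

```python
def post_test_probability(pretest, sensitivity, specificity):
    """Probability of disease given a positive test, via Bayes' theorem."""
    true_positives = pretest * sensitivity                # P(disease AND test+)
    false_positives = (1 - pretest) * (1 - specificity)   # P(no disease AND test+)
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers: a test with 90% sensitivity and 95% specificity,
# applied to a patient with a 10% pre-test probability of the disease.
print(post_test_probability(0.10, 0.90, 0.95))  # ~0.67
```

Note what the made-up numbers show: even a fairly accurate test, applied when the pre-test probability is low, still leaves real diagnostic uncertainty – exactly the kind of reasoning a good diagnostician has to carry in their head.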

AI, it turns out, is good at all three of these elements of diagnosis – it can recognize patterns, do statistical analysis, and draw on consistently thorough, relevant knowledge. What is needed is for AIs to be trained on large amounts of medical data. How do AIs fare when tested against trained doctors? They do consistently well.

A recent study highlights this ability, looking at bronchopulmonary dysplasia (BPD) in infants. An artificial neural network was trained on the respiratory patterns of infants while they slept. The AI was able to correctly identify infants who had been diagnosed with BPD with 96% accuracy. The benefit here is that all it needed was to observe the infant breathing during sleep for 10 minutes. This kind of subtle pattern recognition can potentially replace more invasive diagnostic testing.
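
The study’s actual model, inputs, and preprocessing are not described here, so the following Python sketch is only a generic illustration of the workflow: fixed-length recordings of a physiological signal become feature vectors, and a small neural network learns to separate diagnosed from undiagnosed cases. All data below is a synthetic stand-in.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: each row represents 10 minutes of a sleeping
# infant's respiratory signal sampled at 1 Hz (600 samples per recording);
# labels mark whether the infant was diagnosed with BPD.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 600))     # 200 recordings, 600 samples each
y = rng.integers(0, 2, size=200)    # hypothetical diagnosis labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small feed-forward neural network as a stand-in for the study's model
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

On random noise this classifier will hover around chance; the point of the sketch is the shape of the pipeline, not the numbers.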

AIs are also good at pulling signals out of noisy environments, such as brain activity. This has been a challenge for both research and clinical use of functional MRI (fMRI) scanning, which looks at brain activity in real time. There is simultaneous activity happening all the time, which makes it difficult to pull out the signal of interest. But AIs trained on this type of data are good at such tasks – recognizing specific patterns, then deleting those patterns from the data to see what remains. This has the potential to transform neuroscience research and extend the clinical applicability of noisy diagnostic methods such as PET or fMRI.
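
One concrete, widely used version of this “recognize the pattern, then remove it” approach is independent component analysis (ICA), a standard technique for stripping structured artifacts (cardiac pulsation, motion) out of fMRI-type data. Here is a minimal Python sketch with synthetic signals; note that the artifact-identification step below uses a crude heuristic, which is precisely the step a trained AI model can do better:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-in: a slow signal of interest mixed with a fast rhythmic
# artifact across four recording channels, plus a little sensor noise.
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 2000)
signal = np.sin(2 * np.pi * 0.3 * t)             # slow signal of interest
artifact = np.sign(np.sin(2 * np.pi * 1.2 * t))  # fast rhythmic artifact
observed = np.c_[signal, artifact] @ rng.normal(size=(2, 4))
observed += 0.05 * rng.normal(size=observed.shape)

# Unmix the channels into statistically independent components.
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(observed)         # shape: (2000, 2)

# Identify the artifact component; here a crude heuristic (more
# high-frequency power), in practice a trained model's job.
roughness = np.abs(np.diff(components, axis=0)).sum(axis=0)
artifact_idx = int(np.argmax(roughness))

# Delete the recognized pattern and project back to channel space.
components[:, artifact_idx] = 0
cleaned = ica.inverse_transform(components)
```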

The capability of AI goes beyond pattern recognition – large language models are good at simulating human reasoning by simulating language. Yes, there are problems, such as the potential for hallucinations, and a certain lack of artistic or creative flair. But AI should do well within the confines of a technical task, such as clinical decision-making, as long as it is properly trained on sufficient data. And – it does.

A recent study compared ChatGPT-4 with attending physicians and residents on a standard measure of clinical thinking – analyzing their way through 20 cases. The residents scored 8/10, while attendings scored an average of 9/10. ChatGPT-4 scored 10/10.

In recent studies AI has been shown to outperform physicians in summarizing medical records, and was even rated as having a better bedside manner.

We are not ready to take the human physician out of the loop quite yet. What AIs lack is any genuine understanding, judgment, or reasoning ability. Anyone who has played around enough with one of the LLM chatbots understands this well. They are great at simulating human conversation, but don’t have real understanding, and are easily confounded. They can also confidently spout utter nonsense, with an inability to recognize their output as nonsense (OK, so not different from many people).

The risk here is that AIs can become a de facto oracle, in which case their output will be trusted, either through a misunderstanding of their limitations or through laziness. For now the model is that AIs will be used as expert assistants – they are a tool for clinicians to use in their own decision-making. This way we get the best of both worlds – the wisdom, if you will, of human clinicians combined with the pattern-recognition and analytical power of AI. But does this combination always work?

So far the evidence is extremely positive – AI expert medical systems improve clinician performance. However, there are some interesting exceptions. A recent study looking at using AI diagnostic tools to help radiologists read imaging studies found mixed results. While performance generally increased, the performance of some radiologists actually decreased when using AI as a tool. The study was unable to determine why, but there was no consistent demographic trend (such as years of experience).

What this probably means is that you can’t just throw AI into the clinic and expect it to work seamlessly. Like any new tool, clinicians need to learn how to use it optimally, which includes an understanding of how it works and its strengths and weaknesses. The same was true of all new diagnostic tools. MRI scans, for example, do not just spit out diagnoses with treatment plans. MRI is a powerful diagnostic tool, but doctors had to learn how the technology functions and how it can best be incorporated into clinical practice. Clinicians, in other words, will need to become experts on medical AI.

This will not be a simple transition. We need to continue to develop AI systems specifically for medical use, explore the best ways to incorporate these tools into clinical practice, and train practitioners on their optimal use. But this is worth doing, in my opinion. The potential to increase the quality of medical care, reduce errors, and improve efficiency is massive.



Posted by Steven Novella

Founder and currently Executive Editor of Science-Based Medicine, Steven Novella, MD, is an academic clinical neurologist at the Yale University School of Medicine. He is also the host and producer of the popular weekly science podcast, The Skeptics’ Guide to the Universe, and the author of the NeuroLogica Blog, a daily blog that covers news and issues in neuroscience, as well as general science, scientific skepticism, philosophy of science, critical thinking, and the intersection of science with the media and society. Dr. Novella has also produced two courses with The Great Courses, and published a book on critical thinking, also called The Skeptics’ Guide to the Universe.